The aim of the following exercises is to create an R script that takes care of wrangling the data for the analyses we want to perform in the upcoming sessions on R Markdown. Hence, your solutions for the tasks in this exercise should be written into an R script. This script should be named data_wrangling_gpc.R and stored in a folder called src (for source) in your project directory. As always, it is helpful to comment your code.
Note 1: The idea for this set of exercises is to use the tidyverse functions we have covered in the lecture part. Hence, the solutions will be tidyverse code. However, if you prefer base R, feel free to use that instead. Also, if the wrangling operations in this exercise are too basic for you, feel free to do or add some more advanced stuff. Be aware, however, that it will be easier to continue to work on the R Markdown exercises later on if you follow what we propose to do in the current exercise.
We will be working with the synthetic data set based on the data from the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany. If you have not already done so, please copy the file ZA5667_v1-0-0_CSV_synthetic-data.csv from the workshop materials to a folder called data in your project folder.
Note 2: When you execute code from the solutions (esp. the bit for loading the data), you should make sure that your working directory is the one that contains the R script you are working on (this should be the src folder).
Load the tidyverse packages for this.
Import the data and name the resulting object gp_covid.
The data is stored in a CSV file. You can use a function from the readr package for importing it.
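A minimal sketch of the loading and import step, assuming the script is run from the src folder so that the data folder is reached via ../data, and taking gp_covid to be the name of the imported object. Whether the file is comma- or semicolon-delimited is not stated here, so read_csv() is an assumption; read_csv2() is the readr counterpart for semicolon-delimited files.

```r
# Load the tidyverse (this attaches readr and dplyr, among others)
library(tidyverse)

# Relative path from src/ to the data file (assumes the working directory is src/)
data_path <- "../data/ZA5667_v1-0-0_CSV_synthetic-data.csv"

if (!file.exists(data_path)) {
  # Most likely cause: the working directory is not the src folder
  message("File not found. Current working directory: ", getwd())
} else {
  # Assumption: comma-delimited file; switch to read_csv2() for semicolons
  gp_covid <- read_csv(data_path)
}
```

If the message appears, checking getwd() against the location of the script is usually the quickest way to find the problem.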
Create a new object named corona_survey that contains the wrangled data. For this, we want to create a subset that only includes the following variables: sex, hzcy001a, hzcy002a, hzcy005a, hzcy048a, hzcy051a, hzcy052a, hzcy026a, hzcy084a, hzcy090a. We also want to rename all of these variables except for sex to (same order as before): risk_self, risk_surroundings, risk_infect_others, trust_government, trust_who, trust_scientists, obey_curfew, info_nat_pub_br, info_fb.
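Subsetting and renaming can be done in a single select() call, because select() accepts new_name = old_name pairs. The sketch below shows the pattern for three of the variables only, on a small made-up tibble (gp_covid_demo and its values are invented for illustration and are not part of the workshop data); the remaining variables follow the same pattern.

```r
library(dplyr)

# Made-up stand-in for gp_covid with a few of the columns named above
gp_covid_demo <- tibble(
  sex      = c(1, 2, 1),
  hzcy001a = c(4, 2, 5),
  hzcy002a = c(3, 3, 6),
  hzcy026a = c(1, 4, 2),
  other    = c(10, 20, 30)  # a column that should be dropped
)

# select() keeps only the listed columns; new_name = old_name renames on the way
corona_survey_demo <- gp_covid_demo %>%
  select(
    sex,
    risk_self         = hzcy001a,
    risk_surroundings = hzcy002a,
    obey_curfew       = hzcy026a
  )

names(corona_survey_demo)
```

An equivalent route is select() followed by a separate rename(); the one-step version mainly saves typing.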
obey_curfew to 0.
You can use mutate() in combination with recode() here.
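As a sketch of how mutate() and recode() work together: recode() takes old = new pairs, and numeric old values have to be wrapped in backticks. The value 99 below is a placeholder chosen purely for illustration, not the value from the task, and the demo tibble is made up.

```r
library(dplyr)

# Made-up example data; 99 is a placeholder code, not taken from the exercise
demo <- tibble(obey_curfew = c(1, 2, 99, 4))

# Replace the placeholder code with 0 and leave all other values untouched
demo_recoded <- demo %>%
  mutate(obey_curfew = recode(obey_curfew, `99` = 0))

demo_recoded$obey_curfew
```

In recent dplyr versions recode() is marked as superseded in favour of case_match(), but it remains available.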
NA for the entire data set.
The na_if() function can be used here.
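One way to apply na_if() to every column is to wrap it in mutate(across(...)). The sketch below uses a made-up tibble and -99 as a placeholder missing-value code (an assumption for illustration; substitute the code or codes named in the task). It restricts the operation to numeric columns so that a numeric code is not compared against character data.

```r
library(dplyr)

# Made-up example data; -99 is a placeholder missing-value code
demo <- tibble(
  sex       = c("female", "male", "female"),
  risk_self = c(4, -99, 5),
  trust_who = c(-99, 3, 2)
)

# Apply na_if() column by column to all numeric variables
demo_na <- demo %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -99)))

# Count the missing values per column as a quick check
colSums(is.na(demo_na))
```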
You can, of course, also combine all of the previous wrangling steps into one pipe (i.e., without the need of creating any intermediate objects).
For the R Markdown reports we are going to create in the following sessions, we also want to have a data set that only includes respondents who do not work in critical professions, such as the medical sector, the police force, etc. We can identify those respondents as they should have provided the answer with the code/value 4 to the obey_curfew item. The resulting new data object should be called corona_survey_noncrit.
You can use filter() from the dplyr package.
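A minimal sketch of the filtering step on made-up data (the object names ending in _demo and all values are invented for illustration). It reads the task as "code 4 marks the respondents to keep"; if code 4 instead marks the respondents to exclude, the condition would be obey_curfew != 4.

```r
library(dplyr)

# Made-up stand-in for the wrangled data
corona_survey_demo <- tibble(
  risk_self   = c(4, 2, 5, 3),
  obey_curfew = c(1, 4, NA, 4)
)

# Keep only rows with code 4; filter() also drops rows where the condition is NA
corona_survey_noncrit_demo <- corona_survey_demo %>%
  filter(obey_curfew == 4)

nrow(corona_survey_noncrit_demo)
```

Comparing nrow() before and after filtering is a simple way to confirm that the subset has the expected size.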
After you have successfully created the data wrangling script that contains all of the steps listed above, make sure to save it (as data_wrangling_gpc.R) in the src folder of your project directory, then stage it, commit, and push the changes to GitHub (don’t forget to write a meaningful commit message, such as “Create data wrangling script”).
Note: You can find the full data_wrangling_gpc.R script that you are supposed to create here in the solutions folder of the workshop materials.